Tags: hive lzo Today the microblog data platform team emailed to say that one of their HQL jobs failed to run, and the logs on the gateway did not show the cause. They asked me to help look into it, and the cause was eventually found. The analysis went as follows: 1. The failing HQL: insert overwrite table brand_ad_user_with_interact_score_3 select A.uid, A.brand, A.friend, case when B.weight is null then '0.000000' else B.weight end from brand_ad_2hop_3 A left…
…hours. According to the analysis in step 1, the HQL itself should not produce data skew, so why did a single map run for more than 10 hours? Looking at the counter information of the killed map task showed that this single map task had read 10 GB of data from HDFS. That should not happen unless the data file being processed cannot be split, in which case a single map task ends up processing a single large file. With this hypothesis, I went to check the HQL insid…
LZO-compressed data can be split into multiple parts and decompressed in parallel, and the decompression efficiency is also acceptable.
To work with the department's Hadoop platform for testing, the author details how to install the software packages that LZO requires on the Hadoop platform: GCC, Ant, LZO, the LZO encoder/decoder, and…
LZO usage and introduction: LZO description summary
LZO is a lossless compression library written in ANSI C. It provides very fast compression and decompression, and decompression requires no extra memory. Even though data compressed at a high compression ratio compresses slowly, it can still be decompressed very quickly. LZO is released under the GNU GPL license.
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+LZO (LanguageManual LZO)
LZO Compression
General LZO Concepts
Prerequisites
Main steps:
1. Install or update GCC and Ant (skip this step if they are already installed)
yum -y install gcc gcc-c++ autoconf automake
wget http://labs.renren.com/apache-mirror//ant/binaries/apache-ant-1.8.2-bin.tar.gz
tar -zxvf apache-ant-1.8.2-bin.tar.gz
export ANT_HOME=/usr/local/apache-ant-1.8.2
vi /etc/profile    # add the line below, then reload
export PATH=$PATH:$ANT_HOME/bin
source /etc/profile
2. Install lzo on each node
wget http://www.oberhumer.com/opensou…
/**
 * @author HJX
 * @version 1.0, 2013-01-16
 * @since jdk1.7, ubuntu-12.04-64bit
 *
 * Runs in a Hadoop environment.
 * Writes a string to a local LZO file (not Hadoop HDFS),
 * then reads it back from the LZO file and checks it against the original string.
 */
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.i…
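The excerpt cuts off mid-import. For reference, here is a minimal sketch of such a local round trip, assuming hadoop-lzo's LzopCodec is on the classpath and the native LZO library is installed; the class name and file path are illustrative, not the original author's:

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import org.apache.hadoop.conf.Configuration;
import com.hadoop.compression.lzo.LzopCodec;

public class LocalLzoRoundTrip {
    public static void main(String[] args) throws Exception {
        String original = "hello lzo";
        String path = "/tmp/test.lzo";      // hypothetical local path, not HDFS

        LzopCodec codec = new LzopCodec();
        codec.setConf(new Configuration()); // the codec needs a Configuration to initialize

        // Compress the string into a local .lzo file.
        OutputStream out = codec.createOutputStream(new FileOutputStream(path));
        out.write(original.getBytes("UTF-8"));
        out.close();

        // Decompress it again and compare with the original.
        BufferedReader in = new BufferedReader(new InputStreamReader(
                codec.createInputStream(new FileInputStream(path)), "UTF-8"));
        String roundTripped = in.readLine();
        in.close();

        System.out.println(original.equals(roundTripped) ? "match" : "mismatch");
    }
}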
Using the LZO compression algorithm in Hadoop reduces both the size of the data and the disk read/write time. Beyond that, LZO is block-based, so it allows the data to be split into chunks that Hadoop can process in parallel. This makes LZO a very useful compression format on Hadoop. An LZO file is not splittable by itself, though, so when the data…
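The excerpt breaks off just as it reaches the key point: a raw .lzo file yields a single split unless an index is built for it. As a hedged sketch (assuming hadoop-lzo is on the classpath; the class name and path argument are illustrative), the index can be created programmatically with hadoop-lzo's LzoIndexer:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import com.hadoop.compression.lzo.LzoIndexer;

public class IndexLzoFile {
    public static void main(String[] args) throws Exception {
        // Writes foo.lzo.index next to foo.lzo so that LZO-aware
        // input formats can split the file across multiple map tasks.
        LzoIndexer indexer = new LzoIndexer(new Configuration());
        indexer.index(new Path(args[0]));   // an .lzo file, or a directory of .lzo files
    }
}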
Today I tried to install and configure LZO on Hadoop 2.x (YARN) and ran into a lot of pitfalls. The material online targets Hadoop 1.x and basically does not apply to LZO on Hadoop 2.x, so I am recording the entire installation and configuration process here.
1. Install Lzo
Download the LZO 2.06 versi…
Reference: http://blog.csdn.net/lalaguozhe/article/details/10912527. Environment: Hadoop 2.3.0-cdh5.0.2, Hive 1.2.1. Goal: install LZO, verify that test jobs run, and create Hive tables stored in LZO format. Before this, while trying out Snappy, I found that the native libraries extracted from CDH contain libsnappy but do not contain LZO. Therefore, using…
LZO is a data compression algorithm dedicated to decompression speed. LZO is the abbreviation of Lempel-Ziv-Oberhumer. The algorithm is lossless; for more information, see the implementation, which is thread-safe.
lzop is a free software tool that implements it. The original library was written in ANSI C and published under the GNU General Public License. Currently, LZO is available in various…
After spending the last few days verifying the LZO compression mode, I am left with the following impressions:
Recently, while chasing an LZO usage problem, I found it came down to how java.library.path was set. Much of what is written online says to add the JAVA_LIBRARY_PATH attribute in the hadoop-env.sh file (adding HADOOP_CLASSPATH as well is valid; it is true that the jar packages under the lib directory are not loaded automatically, something this hadoop-0.20.205.0 version does…
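When debugging this kind of problem, a quick hedged check is to print what the JVM actually resolves for java.library.path, independent of what hadoop-env.sh claims to set (the class name here is illustrative):

public class LibPathCheck {
    public static void main(String[] args) {
        // Prints the directories the JVM searches for native libraries
        // such as liblzo2 and libgplcompression.
        System.out.println(System.getProperty("java.library.path"));
    }
}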
1. Dependencies for Hadoop LZO support: Unix/Linux systems do not ship the LZO library by default, so it has to be installed:
sudo yum install lzo-devel.x86_64
sudo yum install lzo.x86_64
sudo yum install lzop.x86_64
2. Prepare Maven, Ant, GCC, and so on.
3. Compile hadoop-lzo. Download it from https://github.com/twitter/hadoop-…
1. Reading LZO files: you need to add the following code and import the LZO-related jar package:
job.setInputFormatClass(LzoTextInputFormat.class);
2. Writing LZO files: the LZO format is not splittable by default; you need to generate an index file for it so that multiple maps can process the LZO file in parallel. If you want the r…
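The excerpt stops mid-sentence, but the two halves combine naturally in one driver. Below is a minimal sketch using the new MapReduce API, assuming hadoop-lzo is on the classpath and relying on Hadoop's default identity mapper and reducer; the class name and argument paths are illustrative, not from the original post:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import com.hadoop.compression.lzo.LzopCodec;
import com.hadoop.mapreduce.LzoTextInputFormat;

public class LzoJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "lzo-in-lzo-out");
        job.setJarByClass(LzoJobDriver.class);

        // 1. Read LZO input; with a .index file present, each chunk becomes its own split.
        job.setInputFormatClass(LzoTextInputFormat.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));

        // 2. Write LZO output.
        FileOutputFormat.setCompressOutput(job, true);
        FileOutputFormat.setOutputCompressorClass(job, LzopCodec.class);
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}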
Using Protocol Buffers with LZO in Hadoop (2). 1. LZO introduction
LZO is an encoding with a reasonable compression ratio and an extremely high compression speed. Its features: the decompression speed is very fast; LZO is lossless compression, and the compressed data can be restored exactly; LZO is block-based and allows data to be split into chunks, which can be decompres…
Using LZO in Hive. 1. Starting Hive reports the error:
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/hive/conf/HiveConf
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:149)
Solution: when Hadoop was installed, a HADOOP_CLASSPATH variable added in conf/hadoop-env.sh under the Hadoop directory overwrites the original variable; the usual fix is to append to it (end the assignment with :$HADOOP_CLASSPATH) rather than replace it.
Before the map job is set up, the text files are merged into LZO files through CombineInputFormat, with the following job settings:
conf.setInt("mapred.min.split.size", 1);
conf.setLong("mapred.max.split.size", 600000000); // 600 MB, so that each compressed file comes out at around 120 MB
conf.set("mapred.output.compression.codec", "com.hadoop.compression.lzo.LzopCodec");
conf.set("mapred.output.compression.type", "BLOCK");
conf.setBoolean("mapred.output.compress…
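A side note not in the original excerpt: these mapred.* keys are the old-API property names. On Hadoop 2 the equivalents are mapreduce.input.fileinputformat.split.minsize and mapreduce.input.fileinputformat.split.maxsize for the split sizes, and the mapreduce.output.fileoutputformat.compress family for output compression; the old names still work but are deprecated.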
        lzoIndexer.index(new Path(args[1]));
        System.exit(result);
    } else if (result == 1) {
        System.exit(result);
    }
}
If you already have an LZO file, you can add an index for it in the following way:
bin/yarn jar /module/cloudera/parcels/gplextras-5.4.0-1.cdh5.4.0.p0.27/lib/hadoop/lib/hadoop-lzo-0.4.15-cdh5.4.0.jar com.hadoop.compression.lzo.DistributedLzoIndexer /user/hive/warehouse/cndns.db/od…
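A hedged note on what to expect: after the indexer runs, a matching .index file should appear next to each .lzo file, and with the index present an LZO-aware input format such as LzoTextInputFormat can generate one split per index chunk instead of handing the entire file to a single map.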